Bangla/English Script Identification Based on Analysis of Connected Component Profiles

Authors

  • Lijun Zhou
  • Yue Lu
  • Chew Lim Tan
Abstract

Script identification is required for a multilingual OCR system. In this paper, we present a novel and efficient technique for Bangla/English script identification, with applications to the destination address block of Bangladesh envelope images. The proposed approach is based on the analysis of connected component profiles extracted from the destination address block images. It places no emphasis on the information provided by individual characters and requires no character or line segmentation. Experimental results demonstrate that the proposed technique is capable of identifying Bangla/English scripts on real Bangladesh postal images.
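The abstract does not say which profile features the authors compute, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a binarized image given as a list of rows of 0/1 values, labels 8-connected components, and computes a hypothetical "top profile" for each component: the distance from the top of the component's bounding box to its first foreground pixel, per column. The function names `connected_components` and `top_profile`, and the toy image, are illustrative assumptions.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground components of a 0/1 image.

    Each component is a list of (row, col) pixel coordinates.
    Components are returned in row-major order of their first pixel.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def top_profile(pixels):
    """Per-column distance from the bounding-box top to the first foreground pixel.

    This is an illustrative feature, not one confirmed by the abstract.
    """
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    right = max(x for _, x in pixels)
    profile = [None] * (right - left + 1)
    for y, x in pixels:
        i, d = x - left, y - top
        if profile[i] is None or d < profile[i]:
            profile[i] = d
    return profile


# Toy image: the left component has a flat top stroke, the right one does not.
img = [
    [1, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 1, 1],
]
comps = connected_components(img)
profiles = [top_profile(p) for p in comps]
```

In this toy case the left component yields an all-zero top profile while the right one does not. A flat top is the kind of cue a headline stroke would produce, but whether the paper relies on such a cue is an assumption here.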


Similar resources

Multi-script Off-line Signature Verification: A Two Stage Approach

Signature identification and verification are of great importance in authentication systems. The purpose of this paper is to introduce an experimental contribution in the direction of multi-script off-line signature identification and verification using a novel technique involving off-line English, Hindi (Devnagari) and Bangla (Bengali) signatures. In the first evaluation stage of the proposed ...


An improved offline handwritten character segmentation algorithm for Bangla script

Effective segmentation of offline handwritten word images of unconstrained handwritten Bangla script is a challenging problem in Optical Character Recognition (OCR) applications. The presence of a continuous horizontal line called ‘Matra’ is an important feature of this script. However, in unconstrained cursive handwriting, Matra can be wavy or discontinuous, which makes the problem of segmentation diffic...


Word level Script Identification from Bangla and Devanagri Handwritten Texts mixed with Roman Script

India is a multi-lingual country where Roman script is often used alongside different Indic scripts in a text document. To develop a script specific handwritten Optical Character Recognition (OCR) system, it is therefore necessary to identify the scripts of handwritten text correctly. In this paper, we present a system, which automatically separates the scripts of handwritten words from a docum...


Convolution Based Technique for Indic Script Identification from Handwritten Document Images

Determining the script type of a document image is a complex real-life problem for a multi-script country like India, where 23 official languages (including English) are present and 13 different scripts are used to write them. Including English and Roman, those counts become 23 and 13, respectively. The problem becomes more challenging when handwritten documents are considered. In this paper an app...


Script Identification from Bilingual Gujarati-English Documents

In a multi-lingual country like India, English words are often interspersed with the Indian regional languages in most official papers, school textbooks, and magazines. A bilingual Optical Character Recognition (OCR) system is therefore needed that can recognize these bilingual documents and store them for future use. In this paper, the authors present an OCR system developed for the script ...




Publication date: 2006